CNG Text Classification for Authorship Profiling Task: Notebook for PAN at CLEF 2013

Authors

  • Magdalena Jankowska
  • Vlado Keselj
  • Evangelos E. Milios
Abstract

We describe our participation in the Author Profiling task of the PAN 2013 competition. The objective of the task is to determine the age and gender of a document's author. We applied the Common N-Gram (CNG) classifier (Kešelj et al., 2003) to this task. The CNG classifier uses a dissimilarity measure based on the differences in the frequencies of the character n-grams that are most common in the considered documents. To train the classifier, each class is represented by a single class document created by concatenating the training documents belonging to that class. A sample document is labelled with the class that has the minimum dissimilarity. For the six-class classification (combinations of two possible gender labels and three possible age labels), we achieved an accuracy of 0.2814 on the English test dataset and 0.2592 on the Spanish test dataset. Our results are below the medians of the results of the competition participants.
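The classification scheme described in the abstract can be illustrated with a minimal sketch. It assumes the dissimilarity takes the form commonly given for CNG in Kešelj et al. (2003), namely the sum over n-grams in either profile of the squared difference of their normalized frequencies divided by their mean. The function names, the n-gram length `n`, and the profile size `size` are illustrative choices, not the settings used in the paper.

```python
from collections import Counter


def profile(text, n=3, size=500):
    """Map the `size` most frequent character n-grams to normalized frequencies."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    return {g: c / total for g, c in counts.most_common(size)}


def cng_dissimilarity(p1, p2):
    """Sum of squared relative frequency differences over the union of two profiles.

    Assumed form: sum(((f1 - f2) / ((f1 + f2) / 2)) ** 2); an n-gram missing
    from one profile contributes with frequency 0 on that side.
    """
    total = 0.0
    for g in set(p1) | set(p2):
        f1 = p1.get(g, 0.0)
        f2 = p2.get(g, 0.0)
        total += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return total


def train(labelled_docs, n=3, size=500):
    """Build one profile per class from the concatenated training documents."""
    by_class = {}
    for label, text in labelled_docs:
        by_class.setdefault(label, []).append(text)
    return {label: profile("\n".join(texts), n, size)
            for label, texts in by_class.items()}


def classify(text, class_profiles, n=3, size=500):
    """Return the class label whose profile is least dissimilar to the sample."""
    sample = profile(text, n, size)
    return min(class_profiles,
               key=lambda label: cng_dissimilarity(sample, class_profiles[label]))
```

For the six-class setting, each label passed to `train` would be a gender and age combination, so `classify` returns one of six labels for a sample document.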

Similar resources

A Basic Character N-gram Approach to Authorship Verification: Notebook for PAN at CLEF 2013

This paper describes our approach to the Author Identification task in the PAN 2013 evaluation lab. We use a profile-based approach built on the common n-grams (CNG) method, with a normalized distance measure for short and unbalanced texts introduced by Stamatatos [6]. We achieved 9th place with an overall F1 score of 0.6.

Proximity Based One-class Classification with Common N-Gram Dissimilarity for Authorship Verification Task: Notebook for PAN at CLEF 2013

We describe our participation in the Author Identification task of the PAN 2013 competition. This competition task presents participants with a set of authorship verification problems. In each such problem, one is given a set of documents written by one author and a sample document; the task is to answer the question whether or not the sample document was written by the same author as the rem...

Author Profiling for English and Spanish Text: Notebook for PAN at CLEF 2013

This paper describes an approach for the author profiling task of the PAN 2013 challenge. This work is based on the idea of linguistic modality that has been successfully used in other classification tasks such as authorship attribution. We consider three different modalities: syntactic, stylistic, and semantic, each representing a different aspect of text. For each modality, we extract informa...

Readability for Author Profiling? Notebook for PAN at CLEF 2013

This paper briefly describes the approach taken to the Author Profiling task at PAN 13. It describes the simple features used and their origins in thinking about text readability as a mechanism for identification, discusses the predictive model, which may have beneficially omitted classes, and offers commentary on the results obtained.

Author Profiling using LDA and Maximum Entropy: Notebook for PAN at CLEF 2013

This paper describes the traditional authorship attribution subtask of the PAN/CLEF 2013 workshop. In our attempt to classify the documents by the gender and age of the author, we have applied a traditional topic modeling approach using Latent Dirichlet Allocation (LDA). We used content-based features like topics and style-based features like preposition frequencies, which act as the eff...

Publication date: 2013